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ABSTRACT 

Algorithms  are  presented  which  find  one  or  all  of  the  critical  points  of  a 
smooth  function  In  a  rectangular  region,  or  the  critical  points  at  which  the 
function  has  maximum  or  minimum  value.  If  no  critical  points  of  the  function 
exist  in  the  given  region,  then  the  algorithm  verifies  this  fact.  The 
computation  is  self -validating,  in  that  the  existence  or  nonexistence  of 
critical  points  is  established  conclusively,  and  guaranteed  upper  and  lower 
bounds  are  confuted  for  all  quantities  of  interest,  including  the  values  of  the 
gradient  vector  and  Hessian  matrix  of  the  function.  The  algorithm  makes  use  of 
an  existing  implementation  of  automatic  differentiation  and  interval 
computation.  Numerical  examples  are  given. 
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SIGNIFICANCE  AND  EXPLANATION 


A  standard  problem  in  scientific  computation  is  the  optimization  of  a 
smooth  (at  least  twice  differentiable)  function,  possibly  subject  to  smooth 
constraints*  Here,  optimization  means  finding  the  maximum  or  minimum  value  of 
the  function  in  some  region,  or  perhaps  its  stationary  points.  The  algorithm 
presented  in  this  paper  solves  this  problem  for  regions  which  are  rectangular  in 
form.  The  method  used  is  a  version  of  interval  iteration  which  finds  one  or  all 
of  the  critical  points  of  the  function  in  the  given  region  (the  point  or  points 
at  which  its  gradient  vector  vanishes ) ,  or  just  the  critical  points  at  which  the 
function  has  its  maximum  or  minimum  values.  The  required  values  of  the  gradient 
vector  and  Hessian  matrix  of  the  function  are  obtained  by  automatic 
differentiation,  which  does  not  involve  symbolic  differentiation  or  numerical 
approximation  of  the  derivatives  of  the  function  bging  investigated.  Unlike 
algorithms  which  sample  function  values  only  at  a  discrete  set  of  points,  the 
ones  given  here  use  interval  confutation  to  furnish  guaranteed  bounds  for  all 
quantities  of  interest  over  the  required  subregions  of  the  initial  region.  ^The 
results  given  by  these  optimization  algorithms  are  thus  self-validating.  The 
algorithms  presented  here  have  been  programmed  using  standard  implementations  of 
automatic  differentiation  and  interval  computation.  Numerical  examples  are 
given  to  illustrate  the  effectiveness  of  this  approach  to  the  type  of 
optimization  problem  considered. 


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


GLOBAL  OPTIMIZATION  USING  AUTOMATIC  DIFFERENTIATION 

AND  INTERVAL  ITERATION 


L.  B.  Rail  , 

\ 

1.  Preliminaries 

This  paper  presents  an  algorithm  for  global,  unconstrained  optimization  of  a  smooth  (at 
least  twice  differentiable)  function  /  :  Rn  — +  R,  that  is, 

(1.1)  f{x)  =  f(x  i,x2,...,xn). 

As  is  well-understood,  this  also  includes  the  case  of  optimization  of  a  function  <f> :  Rm  — 1 ►  R 
subject  ton-m  smooth  constraints 

(1.2)  ft(*l,zj,...)im)=0,  »  =  l,2,...,n  -  m, 

by  formation  of  the  function 

n-m 

(1.3)  /(z)  =  Xm)  ^  *  Im+i  ‘ 

*=1 

where  the  new  ^-ariables  xm+i,  •  • . ,  xn  are  simply  the  Lagrange  multipliers  for  the  problem. 
No  special  properties  of  /,  such  as  convexity,  are  assumed. 

The  method  to  be  used  is  a  critical  point  method,  which  will  find  one  or  all  solutions 
of  the  system  of  equations 


or  just  the  critical  points  at  which  the  value  of  /  is  a  maximum  or  minimum  in  X.  Such 
points  will  be  called  critical  extremal  points  to  distinguish  them,  if  necessary,  from  non- 
critical  points  on  the  boundary  dX  of  X  at  which  /  might  attain  a  maximum  or  minimum 
value.  , 

The  algorithm  will  make  use  of  automatic  differentiation  [ll]  to  compute  the  gradient 
vector  V  f(x)  of  /  at  x  =  (ii,X2,  •  • .  ,xn)«  and  also  its  Hessian  matrix 


This  technique  will  be  combined  with  the  use  of  interval  arithmetic  and  interval  evaluation 
of  library  functions  [8]  in  order  to  compute  guaranteed  bounds  for  values  of  functions  and 
their  derivatives  over  the  region  of  interest.  The  result  will  be  an  automatic,  self-validating 
optimi/ation  algorithm. 

Automatic  differentiation  has  been  used,  at  least  in  a  restricted  form,  by  McCormick 
[6j  for  optimization  problems.  Interval  methods  have  been  applied  by  Hansen  [2],  [3]  and 
Hansen  and  Sengupta  [4]  to  global  optimization  problems,  including  constrained  problems. 
Although  the  basic  algorithm  given  below  is  for  unconstrained  problems,  the  ideas  pre¬ 
sented  by  Hansen  indicate  the  possibility  of  introducing  constraints  into  the  calculations. 
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2.  Automatic  Differentiation 

The  basic  idea  behind  automatic  differentiation  is  to  use  the  formula  or  subroutine  for 
the  evaluation  of  the  function  /  at  x  to  obtain  also  values  of  its  derivatives  at  the  same 
point.  This  is  done  by  the  introduction  of  a  new  representation  of  variables,  and  arithmetic 
operations  which  include  the  rules  for  differentiation.  The  resulting  computational  scheme 
is  simple  to  program  for  computers  [11],  [13j,  and  avoids  both  the  complexity  of  symbolic 
differentiation  and  the  inaccuracy  of  numerical  differentiation.  The  new  variables  are 
triples 

(2.1)  U  =  (u,u\ u"), 

where  u  €  R  is  a  real  number,  u'  6  Rn  is  an  n-dimensional  real  (column)  vector,  and 
u"  is  a  symmetric  real  n  x  n  matrix.  The  set  of  these  elements  will  be  denoted  by  Hn, 
and  each  u  6  Hn  is  said  to  be  of  type  HESSIAN.  A  variable  U  of  type  HESSIAN  will 
be  interpreted  in  the  following  way:  Its  first  component  u  will  represent  the  value  of  a 
real-valued  function  at  some  point  x  G  R”,  and  u'  and  u"  the  values  of  its  gradient  vector 
and  Hessian  matrix,  respectively,  at  the  same  point. 

It  is  obvious  that  Hn  forms  a  linear  space.  More  importantly,  all  the  standard  arith¬ 
metic  operations  can  be  defined  in  Hn: 

(2.2)  U  +  V  ~  (u,u\ u")  +  (v,  t/,  v ")  =  (u  +  v,u#  +  v',  u"  +  v"), 

(2.3)  U  -V  —  (u,u',u")  -  (v,v',v")  =  (u  —  v,u*  -  »/,  u"  -  v"), 

U  ■  V  =  (u,  u',  u")  •  ( v ,  v' ,  v") 

(2.4) 

T  T 

=  (u  •  r,  v  ■  v'  +  v  •  u' .  u  ■  v"  4-  u'v'  +  v'v'  +  v  •  u”), 

U/V  =  {u,u',u")/{v,  v',v") 

(U  v  •  u’  -  u  •  v’  v 2  •  v"  -  v  ■  (v'u,T  -f  u'v'T)  +  2 u  ■  v'v'T  -  uv  ■  v"  \ 

\  '  /  1  i  2  •  ~  ’  o  )  ? 

v  f  0. 


3 


The  above  definitions  implement  the  rules  for  evaluation  and  differentiation  of  sums, 
differences,  products,  and  quotients  of  functions  with  known  values  and  derivatives.  In 
irder  to  use  an  algorithm  for  evaluation  of  a  real  function  to  obtain  the  corresponding 
values  in  Hn,  it  is  necessary  to  be  able  to  represent  the  independent  variables  xt,  i  = 
l,2,...,n  and  constants  c  as  elements  of  Hn.  This  is  done  by  the  mapping 

(2.6)  Xi  f*  (x„e,,0), 

for  the  tth  independent  variable  x,,  where  t{  denotes  the  tth  unit  vector,  and  0  the  n  x  n 
Eero  matrix.  (0  will  be  used  to  denote  zero  vectors  and  matrices,  as  well  as  the  real  number 
sero.)  Similary,  constants  c  are  represented  by 

(2.7)  e  (e,0,0). 

ft  follows  that  calculation  of  the  value,  gradient  vector,  and  Hessian  matrix  of  a  rational 
function  can  be  done  simply  by  making  the  substitutions  (2.6)  and  (2.7),  and  applying  the 
rules  (2.2)-(2.5).  The  results  are  exact,  not  numerical  approximations,  and  are  obtained 
without  symbolics. 

In  actual  practice,  instead  of  using  the  representation  (2.7)  for  constants,  it  is  simpler 
to  define  a  mixed  arithmetic  between  elements  c  €  R  and  U  =  ( u,u',u ")  G  Hn  [13]: 


[2.8) 

c  +  U  =  U  +  c  =  (c  +  u,u\  u"), 

(2.9) 

c  -  U  =  (c  -  u,  -  u',- u"), 

[2.10) 

U  -  c  =  (u  -  c,u', u,;), 

[2-11) 

c  •  V  =  U  •  c  =  (c  •  u,c  ■  u\c  •  u"), 

(2.12) 

(c  c  ■  u'  2c  •  u'u'T  -  cu  ■  u"\ 

C  u  =  (  ,  ), 

Vu  u  *  u'5  / 

).  «  ^  0, 


•  .  •  „  ,  •  „  •,*,*.*  .  ♦  „  *  „  •  ■  >  *•„  . 
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For  example,  consider  the  two-dimensional  Rosenbrock  function  (f  1],  p.  95): 


(2.14) 


f(x)  =  100{l2  -I,)2+  (1  -Xl)2. 


In  order  to  evaluate  this  function  together  with  its  gradient  vector  and  Hessian  matrix  at 
the  point  x  -  (- 1.2, 1.0),  one  sets 


(2.15) 


X} 


and  evaluates  (2.14)  using  the  above  rules.  The  result  is 

(2.16)  (/(*),  V/M.w/M)  =  (24.2,  (-_2818506) ,  (12830°0°  2^0))’ 


which  is  exactly  what  one  would  get  by  differentiating  (2.14)  symbolically  and  then  eval¬ 
uating  the  results  for  Xj  =  -1.2,  X?  —  1.0  in  real  arithmetic. 

In  addit  ion  to  rational  functions  of  several  variables,  other  standard  functions  can  be 
defined  read i 1  y  on  H".  For  example, 


(2.17) 


sinf/  =  sin(u,  u',  u")  =  (sin  u,  cos  u  •  u^cosu  ■  u"  -  sinu 


/  rT \ 
u  u  ). 


In  general,  if  g  :  R  — >  R  is  twice  differentiable,  then  it  can  be  extended  immediately  to 
the  mapping  g  :  H”  — >  Hn  by  use  of  the  chain  rule: 

(2.18)  g(U)  =  g({u,u’,u"))  =  ( g(u),g'(u )  •  u',g'(u)  ■  u"  +  g”(u)  ■  u'u'T ), 

[ni,  in.. 

It  is  easy  to  program  automatic  differentiation  in  languages  such  as  Ada  and  Pascal- 
SC  :  13. ,  which  permit  introduction  of  data  types  and  additional  definitions  of  the  standard 
operator  symbols  to  manipulate  such  types.  (This  is  sometimes  called  “overloading”  the 
standaid  ooo*-ator  symbols.)  In  these  languages,  the  variables  j]  and  X2  in  (2.14)  would 


declared  to  be  of  type  HESSIAN,  along  with  the  result  /,  and  the  evaluation  would  be 
ried  out  on  the  basis  of  an  expression  of  the  same  form  as  (2.14).  In  ordinary  Pascal 
FORTRAN,  (2.14)  would  have  to  be  rewritten  as  a  sequence  of  calls  to  subroutines 
addition,  exponentiation,  etc.  (11  j.  The  algorithms  described  in  this  paper  have  been 
•grammed  in  Pascal-SC,  and  some  results  are  given  in  the  final  section. 

Interval  Computation 

ordinary  optimization  algorithms,  the  function  to  be  optimized  is  sampled  only  at  a 
crete  set  of  points.  This  can  result  in  the  loss  of  valuable  information  about  the  function, 
e  algorithms  presented  in  this  paper,  on  the  other  hand,  use  interval  computation,  which 
>duces  guaranteed  bounds  for  the  values  of  functions  and  their  derivatives  over  entire 
;ions  [8j.  This  prevents  the  process  from  being  misled  by  incomplete  information. 

The  basic  component  of  interval  computation  is  interval  arithmetic  [8].  Let  IR  denote 
;  set  of  bounded,  closed  intervals  on  the  real  line  R.  For  /  =  [a,  6]  6  IR,  J  =  [c,d]  e  IR, 
i  arithmetic  operations  are  defined  by 

1)  I*  J  =  [a,  6]  *  (c,d]  =  {x*y\xeI,y6J}  =  [r,s], 

iere  ★  6  and  division  by  an  interval  containing  0  is  excluded.  In  actual 

plementation  on  computers,  directed  rounding  is  used  (downward  for  lower  endpoints, 
ward  for  upper  endpoints),  so  the  actual  result  computed  is  [Vr,  As],  which  always 
itains  the  exact  result  jr,  s]  of  the  interval  operation. 

Evaluation  of  a  real  rational  function  /  :  R  — >  R  in  interval  arithmetic  results  in  an 
erval  inclusion  F  :  IR  — ♦  IR  of  /  ]8],  which  has  the  property 

2)  f{X)  =  {/(x)  ]  x  €  X)  C  F(X),  X  e  IR. 

noting  the  endpoints  of  an  interval  /  —  [a,  6]  by  inf  /  =  a,  sup  1  =  6,  respectively,  (3.2) 
ans  that 

3)  inf  F(X)  <  f(x)  <  sup  F(X).  x  €  X. 
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These  bounds  for  the  range  of  f(z)  over  A'  are  obtained  automatically .  with  iut  investiga¬ 
tion  of  the  minimum  and  maximum  values  of  f(x)  on  A',  and  are  furthermore  guaranteed 
(although  they  may  be  somewhat  crude)  [8- .  This  is  the  basis  of  the  self- validating  char¬ 
acter  of  interval  computation.  Furthermore,  interval  extensions  obtained  by  using  interval 
arithmetic  are  monotone  in  the  sense  that  X  C  V'  imples  that  F(X)  C  F(Y).  In  exact 
arithmetic,  F  is  an  extension  of  /  in  the  sense  that  jF{jx,x])  =  f(x)  for  x  (E  R  1 8  j .  In 
what  follows,  x  will  be  used  to  denote  the  degenerate  interval  [x,xj  6  IR  as  well  as  the 
real  number  x.  Other  handy  notations  to  be  used  from  time  to  time  are 

(3.4)  w[I)  -  io([a, 6])  =  6  -  a,  m(7)  =  m([a,6])  =  — 

for  the  width  and  midpoint,  respectively,  of  an  interval  /  6  IR. 

Just  as  in  the  case  of  differentiation  arithmetic,  interval  arithmetic  can  be  extended  to 
include  various  standard  functions  encountered  in  applications.  Efficient  implementations 
of  interval  arithmetic  and  interval  inclusions  of  standard  functions  are  now  available  in  a 
number  of  computational  environments,  for  example,  Pascal-SC  for  microcomputers  and 
the  ACRITH  package  for  IBM  370  computers. 

The  space  IR”  of  interval  vectors  X  =  [X j.  X2,  ■  ■  ■ ,  X„)  is  defined  in  the  same  way  as 
R”,  and  the  notions  of  interval  matrices  and  vector  and  matrix-vector  interval  arithmetic 
arise  in  a  natural  way.  The  interval  scalar  product  of  interval  vectors  AM'  is  defined  to  be 

(3.5)  X  V  =  {|>,y.  |  X,  6  X,,y,  £  V, ), 
and  the  notation  m( X)  will  be  used  for  the  midpoint 

(3.6)  m(A')  --  (m(X,).m(A'2) . m(A\,)) 

of  the  interval  vector  A'. 

Now.  if  the  evaluation  of  the  function  (2.14)  is  performed  in  interval  arithmetic  with 
X]  0.9,  1.2  ,  -  0 .8.  1.1  ,  then  the  result  is 

(3.7)  /(.V)  0.0.41.0. 


t 


X  denotes  the  interval  vector  X  =(j0.9, 1.2], [0.8, 1-1  j).  This  means  that 

0  <  f{x)  <  41 

.<  x,  <  1.2,  0.8  <  x2  <  1.1.  Thus,  the  bounds  (3.8)  are  obtained  automatically, 
by  evaluation  of  (2.14)  in  interval  arithmetic,  in  much  the  same  way  that  values 
gradient  vector  and  Hessian  matrix  of  (2.14)  were  obtained  in  §2  by  the  use  of 
itiation  arithmetic.  Furthermore,  as  stated  above,  the  bounds  given  by  interval 
etic  are  guaranteed  to  be  valid. 

tie  next  step  is  to  combine  the  differentiation  arithmetic  in  §2  with  interval  arith- 
An  element  T  of  type  IHESSIAN  will  be  a  triple 

T  =  (U,U\ U"), 

17  €  IR  is  an  interval,  U'  €  IRn  is  an  interval  (column)  vector,  and  U"  is  a  symmet- 
Tval  n  x  n  matrix.  The  resulting  set  of  elements  will  be  denoted  by  IHn.  Arithmetic 
ions  in  IHn  are  defined  by  (2.2)~(2.5),  with  the  operations  inside  the  parentheses 
;d  by  the  interval  operations  (3.1).  Similarly,  operations  between  constants  c  G  IR 
tments  of  IH”  are  defined  by  (2.8)— (2. 13)  and  the  corresponding  interval  operations, 
instants  c  are  mapped  into  IR  by  c  [c,  c],  as  before. 

t  example,  the  evaluation  of  (2.14)  as  type  IHESSIAN  can  be  carried  out  over  the 
Is  0.9  <  X\  <  1.2,  0.8  <  x2  <  1.1  by  setting 


i, 


=  (10  9  121  r°’0l 

V0-9, I*21, \(0,o]M,0,0]  io,o);;’ 


(f08  m  (i°.°n  (io,°)  m\\ 

•r2  —  y0-8, 1-11,  v^fi,!]  /  ,  \ fo,oj  (0,0];;- 


iult  is 

*W=  (F(X),F'(X),F"(X))  = 

139.4. 307.6) 

-128.0,58.0 


=  10.0,41.0], 


(534.0. 1410.0] 
-480.0,  -360.0] 


[—480.0,  -  360.0] 

;  200.0, 200.0] 


,v 

,V 


^here  X  denotes  the  interval  vector  X  =  (10.9, 1.2], |0.8, 1.1]).  This  gives  not  only  the 
lounds  (3.8)  for  f(x)  over  X,  but  also  the  bounds 


V/(x)  €  F'(X) 


(-139.4,307.6] 
-128.0, 58.0] 


or  the  gradient  vector  of  /,  and 


Hf{x)  e  F"(X)  = 


[534.0,1410.0!  [-480.0,-360.0] 

-480.0,-360.0]  [200.0,200.0] 


or  the  Hessian  matrix  of  /  over  X.  These  bounds,  obtained  automatically  by  the  use  of 
[HESSIAN  arithmetic,  are  guaranteed. 

In  addition  to  bounds  for  the  values  of  /  and  its  derivatives  on  X,  the  IHESSIAN 
imputation  (3.11)  provides  information  about  the  continuity  of  /  and  V/  on  X.  For  an 


interval  I  —  [a,  b]  e  IR,  let 


(3.14) 


|/|  =  |  [a,  6]  |  =  max{ja|,  |6|}. 


If  X  -  (Xi,Xi,  •  •  • ,  Xn)  is  an  interval  vector,  then  ||X||  will  denote  the  quantity 


(3.15) 


|X||  =  max  |X,|, 


and  for  an  n  x  n  interval  matrix  M  —  (A/tJ),  let 


(3.16) 


IIAfll  =  max  ^  jA/t;  i, 


analogously  to  the  oo-norm  in  Rn  [8].  If  the  IHESSIAN  value  of  a  function  /  :  Rn  — >  R 
over  X  6  IR"  is  denoted  by  4>(X')  =  (F(X),  F'(X),  F"(X)),  then  the  existence  of  F'(X) 
implies  that  /  is  Lipschitz  continuous  on  X,  and  L  =  [|F'(X)||  is  a  Lipschitz  constant  for 
/  on  X.  Similarly,  the  existence  of  F"(X)  implies  that  V/  is  Lipschitz  continuous  on  X, 
and  F"(X)!j  is  a  Lipschitz  constant  for  V/  on  X.  Thus,  for  the  function  (2.14),  it  follows 
from  (3.11)  that 


(3.17) 


/(*)  -  f(y)  <  307.6  •  i  -  y1  cx. 


D.5) 


/  [  —  4.802  x  10"9, 1.242  x  10~8]\ 

V  [-6.200  X  10-9, 2.400  x  10  9]  /  ’ 


d 

0.6) 

"2(X*) 


[801.999999987,802.000000030; 


j  -400.000000004 ,  -  399.999999998] 
200 


M 

V  [-400.000000004,  -399.999999998] 
he  midpoint  of  X*  was  calculated  to  be  x  =  (1,1),  with  Fz{x)  =  0,  F, (x)  =  ([[),  and 
">(*)  =  ( 


e  Hessian  matrix  F' 


802  —.00 


-400  200 

x*,  /2(^’),  X/2(x*),  and  Hf2(x*)  by  the  above. 


,  which  are  validated  to  be  the  exact  values 


The  results  in  three  dimensions  were  completely  similar,  with  the  midpoint  x  =  (l,  1, 1) 
'  X*  giving  the  exact  values  /3(x)  =  0,  V/3(x)  =  anc^ 


0.7) 


802  -400  0 

Hf3(x)  =  |  -400  1002  -400  |  . 

0  -400  200 


The  three-humped  camel  function  g(x)  given  by  (10.2)  has  five  critical  points: 


0.8) 


=  (0,0), 


hich  is  its  global  minimum  point, 

0.9)  =  ±  (\J2.l  -  v/0.865,  l-  \/ 2.1  -  , 


hich  are  saddle  points,  and 

0.10)  ±2*  =  ±  2.1  +  \/0.865,  ^2.1  -f  n/0.865), 


hich  are  relative  minimum  points.  One  has 


0.11)  0  =  g(x *)  <  g(-z*)  =  9(2*)  <  g(-y-)  g(y*). 

The  search  for  all  critical  points  was  conducted  in  the  inital  region 

0.12)  X0  =  ([-2.0,1.81,  [-0.9, 1.0’), 
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/  v  .r  /. 


'he  modification  of  the  program  to  search  for  a  global  critical  minimum  gave  the 
ing  results: 


Case  1  Case  2  Case  3  Case  4 


Transformations 

92 

140 

346 

219 

Rejected 

75 

110 

169 

172 

Critical  Points 

1 

1 

1 

1 

Transformations  to  Locate 

83 

103 

330 

201 

?  results  show  a  considerable  improvement  over  the  search  for  all  critical  points.  Once 
obal  minimum  value  has  been  found,  remaining  regions  are  generally  rejected  quickly, 
dgorithm  was  very  effective  for  the  largest  problem  considered,  Case  4  above. 

The  required  increase  in  minimum  function  values  in  the  search  for  a  global  critical 
mum  forced  the  algorithm  toward  the  boundary  of  Xo,  where  regions  were  quickly 
ed.  The  corresponding  results  were: 


Case  1 

Case  2 

Case  3 

Case  4 

Transformations 

9 

16 

27 

37 

Rejected 

9 

16 

28 

38 

Critical  Points 

0 

0 

0 

0 

?  last  two  cases,  the  final  regions  were  rejected  without  having  to  perform  a  Krawczyk 
formation. 

n  two  dimensions,  the  interval  iteration  to  the  critical  point  z*  converged  to 
)  X*  (0.999999999999.1.00000000001),  [0  999999999999,1.00000000001]) 

F2(X*)  =  0.00.9,62  x  lO-20], 


The  Rosenbrock  function  (10.1)  has  the  global  minimum  /n(x*)  =  0  at  the  critical 
point  x*  =  e  =  (1, 1, . . . ,  1).  It  is  easy  to  find  x*  by  Newton’s  method,  but  methods  which 
try  to  reduce  /n(x)  at  each  step  find  this  function  rather  difficult,  particularly  in  higher 
dimensions.  Four  cases  were  considered: 

1.  n  =  2,  X0  =  ([0.9, 1.2], [0.8, 1.1]); 

2.  n  =  2,  Xo  =  ([-3.7, 1.4], [-1.6, 3.5]); 

3.  n  =  3,  X0=  ([0.9, 1.2],  [0.8, 1.1],  [0.9, 1.2]); 

4.  n  =  3,  Xo  =  ([-3.7, 1.4], [-1.6, 3.5], [-3.7, 1.4]). 

The  value  f  =  0  was  taken  in  each  case,  and  no  exceptions  occured  in  any  of  the 
examples  given  here.  Searching  for  all  critical  points  gave  the  following  results: 


Case  1 

Case  2 

Case  3 

Case  4 

Transformations 

242 

957 

553 

1976 

Rejected 

167 

685 

432 

1567 

Critical  Points 

1 

1 

1 

1 

Transformations  to  Locate 

128 

187 

328 

1672 

The  algorithm  was  very  busy  in  the  neighborhood  of  the  critical  points  x*  =  (1,1)  and 
x*  =  (1, 1,1)-  The  region  in  which  (5.5)  holds  turned  out  to  be  rather  small,  and  nearby 
regions  not  containing  x*  had  to  be  made  very  small  before  they  could  be  rejected  with 
certainty.  The  increase  in  area  of  Xo  by  a  factor  of  289  between  cases  1  and  2  increased 
the  number  of  Krawczyk  transformations  required  by  a  factor  of  less  than  four,  while  the 
increase  in  volume  of  Xo  between  cases  3  and  4  by  a  factor  of  4913  resulted  in  an  even 
smaller  increase  in  the  number  of  transformations,  less  than  3.6.  Going  from  two  to  three 
dimensions  increased  the  number  of  transformations  required  to  search  the  entire  initial 
region  by  a  factor  of  about  two  in  each  case. 


0.  Numerical  Examples 

he  functions  selected  for  numerical  computation  were  the  n-dimensional  Rosenbrock  func- 
on  [1], 

n  -  1 

10.1)  fn(x)  =  ^[l00(xJ+i  -x,2)2-k  (1  -  xt)2j, 

t=i 

nd  the  “three-humped  camel”  function  [3], 

10.2)  g(z)  =  2x2  -  l.OSxf  +  \x\  -  ZjX2  x\- 

6 

The  program  used  was  written  in  Pascal-SC  for  a  microcomputer  with  a  Z80  processor 
nd  the  CP/M  operating  system.  This  was  done  to  take  advantage  of  support  for  interval 
rithmetic,  an  already  written  library  of  operators  and  functions  for  type  IHESSIAN,  and 
le  utility  procedure  LGLI  for  solving  linear  systems  with  interval  coefficient  matrices  and 
ght  sides.  On  the  other  hand,  the  small  amount  of  storage  available  in  this  machine  (64 
ilobytes)  limited  the  values  of  n  for  the  Rosenbrock  function  (10. 1)  to  n  =  2,3.  The 
ctual  machine  used  was  also  rather  slow,  with  a  1MHz  system  clock,  giving  typical  times 
>r  floating-point  interval  addition  and  subtraction  of  13.5  milliseconds,  multiplication, 
7.5  milliseconds,  and  division,  77.5  milliseconds.  Nevertheless,  the  results  given  below 
ere  obtained  in  a  reasonable  amount  of  time. 

The  most  time-consuming  part  of  the  computation  is  the  performance  of  the  Krawczyk 
ansformation  K(X)  (actually,  K(X)),  using  the  Pascal-SC  utility  program  LGLI  to  solve 
le  system  (5.1)  with  interval  coefficient  matrix  and  right  side.  A  count  is  made  of  the 
umber  of  times  this  transformation  is  performed,  the  number  of  critical  points  found, 
re  number  of  regions  rejected,  and  the  number  of  regions  (if  any)  in  which  exceptions 
re  encountered.  The  sum  of  the  number  of  regions  rejected  (which  cannot  contain  crit- 
al  points),  the  number  of  regions  in  which  critical  points  are  found,  and  the  number  of 
(ceptional  regions  gives  the  total  number  of  subregions  examined.  The  Krawczyk  trans- 
•rmation  may  be  applied  to  a  given  region  several  times  before  it  is  accepted  as  containing 
critical  point,  or  rejected. 


-  ■  »•  '>-  H-'V  *.  «».  **“  to 
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points  which  are  not  critical,  an  alteration  has  to  be  made  in  the  above  procedure.  The 
value  of  m(t)  is  updated  only  when  regions  X*  containing  critical  points  x*  are  computed 
by  interval  iteration.  If  supF(x*)  <  m(t  -  1),  then  we  set  m(<)  =  supF(x*),  otherwise 
m(t)  —  m(t-l).  The  rejection  criterion  inf  F(Z)  >  m(t)  remains  unaltered.  The  algorithm 
will  generally  be  slower  than  the  one  given  above  in  this  case,  but  usually  still  faster  than 
an  exhaustive  search  for  all  critical  points. 

In  the  same  way,  a  function  M(t)  giving  the  lower  bound  for  the  global  maximum 
M  of  /  is  constructed  by  setting  M( 0)  =  -MAXREAL,  and  updating  by  M(t)  to  be 
the  maximum  of  M{t  —  1)  and  infF(x)},  assuming  that  the  global  maximum  is  critical. 
Intervals  are  rejected  if  supF(X)  <  M(t).  Otherwise,  M(t)  is  updated  only  at  critical 
points,  as  above.  The  modification  of  the  choice  algorithm  for  bisected  intervals  is  done 
by  reversing  inequality  signs  and  interchanging  infs  with  sups  in  the  above. 

9.  Use  of  the  Algorithm  for  Validation 

In  addition  to  its  use  for  global  searching,  the  algorithm  given  in  §6  can  be  used  to  validate 
solutions  to  optimization  problems  given  by  other  algorithms.  For  example,  suppose  that 
x  is  an  approximate  critical  point  of  /  found  by  Newton’s  method  or  some  other  numerical 
technique.  Then,  the  initial  region  X  can  be  taken  to  be,  say 

(9.1)  X  =  x+  |  -  c  -  [-1, 1], 

where  b  >  0  and  e  =  (1,1,. ..,1)  6  Rn  denotes  the  vector  with  all  components  equal  to  one. 
If  (5.5)  holds  for  this  value  of  X ,  then  all  components  of  x  are  validated  to  be  accurate  to  a 
tolerance  of  6.  Furthermore,  the  interval  iteration  (6.1)  will  give  approximations  of  possibly 
increased  accuracy  for  the  critical  point,  as  well  as  bounds  the  for  function,  gradient  vector, 
and  Hessian  matrix  values.  It  should  be  noted  that  the  interval  calculations  furnish  upper 
and  lower  bounds  for  maximum  and  minimum  values  of  the  function,  while  some  other 
methods  give  only  one-sided  bounds. 


S. 
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Successful  termination  of  the  computer  program  will  be  accompanied  with  a  list  of 
the  number  of  regions  processed  (given  by  the  number  of  times  the  Krawczyk  transforma¬ 
tion  was  performed),  the  number  rejected,  and  the  number  of  critical  points  found,  and 
the  number  output  to  the  exception  file.  Given  all  its  critical  points,  the  global  critical 
maximum  and  minimum  of  the  function  can  be  found  simply  by  sorting  the  function  values. 

8.  Global  Critical  Extrema 

The  algorithm  of  §6  can  be  speeded  up  if  only  the  global  critical  maximum  or  mininum  of 
/  on  X  is  desired.  The  modification  of  the  algorithm  to  find  the  global  critical  minimum 
will  be  described;  finding  the  global  critical  maximum  follows  exactly  the  same  pattern. 

Suppose  first  that  the  global  critical  minimum  of  /  on  X  is  actually  its  global  minimum 
on  A',  that  is,  f(x)>m  —  f{x*)  for  x  €  A',  where  x*  is  the  global  critical  minimum  point. 
The  algorithm  will  compute  a  decreasing  sequence  of  upper  bounds  m(t )  for  m,  and  reject 
subregions  Z  such  that  inf  F(Z)  >  m(t).  The  modifications  of  the  corresponding  steps  for 
the  single-processor  algorithm  are: 

lc.  Set  m(0)  =  MAXREAL,  the  largest  floating-point  number  (for  example,  in  Pascal- 
SC.  M  AXRE.4L=9.99999999999x  10"). 

2  .  If  sup  F(x)  <  m(t  -  1),  then  set  rn(t)  =  sup  F(x),  otherwise,  m(t)  =  m(t  —  1). 

(>  .  Reject  Zl  if  0  £  F'(Zl)  or  inf  F(Zl)  >  m(t)\  reject  Zr  if  0  g1  F'(Zr )  or  inf  F(Zr)  > 
m(t).  If  neither  Zl  nor  Zr  can  be  rejected,  then  Zl  is  considered  to  be  more  promising 
if  inf  F(Zl)  <  inf  F(Zr),  and  ZT  is  stacked,  or  conversely.  In  case  inf  F(Zl )  =  inf  F(Zr), 
then  Zr  is  stacked  if  supF(Zr)  >  supF(Z*),  otherwise.  Zl  is  stacked. 

Considerable  savings  in  computer  time  have  been  observed  due  to  the  introduction  of 
the  additional  rejection  conditions  in  6  .  In  the  case  of  multiprocessors,  an  efficient  way  to 
share  the  current  value  of  rn(t)  is  necessary,  and  the  rejection  of  regions  in  which  function 
values  are  too  large  would  be  carried  out  in  step  1  . 

In  case  the  function  /  can  attain  smaller  values  than  w  on  the  boundary  dX  of  A'  at 


7.  Exceptions 

Several  exceptions  can  arise  in  the  execution  of  the  algorithm  in  §6  which  could  terminate 
the  computation  prematurely,  or  cause  it  to  run  indefinitely.  These  and  the  way  they  are 
handled  will  be  discussed  now,  because  they  may  also  occur  in  the  search  for  global  critical 
extrema. 

1°.  If  F"(x)  contains  a  singular  or  badly  conditioned  matrix,  then  the  attempt  to 
perform  the  Krawczyk  transformation  by  solving  (5.1)  will  fail.  One  solution  is  to  replace 
F"{x )  by  some  nonsingular  matrix,  for  example,  m(F'f(X))  could  work  [9].  The  implemen¬ 
tation  used  for  the  examples  given  below  simply  outputs  X  to  a  file  for  later  examination, 
with  an  appropriate  message,  and  then  selects  the  next  region  to  be  processed  from  the 
stack. 

2°.  The  intersection-bisection  process  can  lead  to  regions  which  do  not  differ  from 
the  previous  ones,  because  of  outward  rounding,  or  which  are  so  small  that  total  time 
to  explore  the  entire  region  is  prohibitive.  This  can  happen,  in  particular,  if  a  critical 
point  lies  exactly  on  a  bisection  coordinate.  For  this  reason,  the  user  is  provided  with  a 
parameter  €  such  that  if  the  volume  V  of  the  region  to  be  processed  satisfies 

(7.1)  V<e-V0, 

then  the  region  will  be  output  to  a  file  for  later  examination,  with  an  appropriate  message. 

The  choice  <  =  0  is  permitted;  this  allows  the  processing  of  smaller  and  smaller  regions 
until  their  volume  (6.9)  underflows  to  0  or  some  coordinate  becomes  degenerate. 

3°.  If  the  storage  space  allotted  to  the  stack  is  full,  then  additional  regions  will  be 
output  to  a  file. 

4°.  Numerical  exceptions,  such  as  division  by  zero  and  overflow,  are  allowed  to  termi¬ 
nate  the  present  program.  However,  they  could  be  used  as  signals  to  output  the  offending 
region  to  a  file  with  an  appropriate  message. 


In  the  case  of  a  single  processor,  the  choice  of  which  interval  to  stack  in  step  6° (tv) 
will  be  modified  in  the  case  global  critical  maxima  or  minima  are  sought.  If  only  one 
critical  point  is  sought,  the  algorithm  is  terminated  at  the  end  of  step  3°.  Otherwise,  the 
algorithm  can  continue  until  all  critical  points  of  /  in  X  are  found,  and  no  regions  remain 
on  the  stack  to  process.  Complete  processing  of  X  without  finding  critical  points  proves 
that  it  contained  none. 

Because  the  processes  of  intersection  and  bisection  can  result  in  subregions  of  a  wide 
range  of  sizes,  a  simple  count  of  the  number  processed  at  any  given  time  does  not  give  a 
good  indication  of  the  progress  being  made  by  the  algorithm.  For  this  reason,  it  has  been 
found  convenient  to  compute  the  initial  volume 

n 

(6.9)  Vo-liW 

i=l 

of  the  region  X  =  (Xi,  X2, .  • . ,  Xn)  to  be  searched.  The  unexplored  part  of  the  initial 
region  has  volume  Vu(t)  at  time  t,  where  V (0)  =  Vo-  Fu(t)  can  be  computed  simply  as 
the  sum  of  the  volumes  of  the  intervals  being  processed  and  those  on  the  stack  awaiting 
processing  at  time  t.  Vu(t)  is  a  monotone  decreasing  function  of  f,  and  the  algorithm 
terminates  when  the  stack  is  empty  and  Vu(t)  -  0,  if  an  exhaustive  search  is  desired. 


and  one  takes 


Zl  —  (Z%, . . . ,  Zj-\,  [inf  Zj,  tn(Zj)],  Zj+\,. . . ,  Zn), 

(6.8) 

ZT  =  (Zj, . . . ,  Zj-\,  [m(Zj),sup(Zj)J,  Zj+ 1, . . . ,  Zn). 

6°.  (Single  processor)  Compute  $(Z*),  $(Zr).  The  test  (4.1)  is  applied  to  F'(Zl)  and 
F'(ZT).  There  are  four  cases: 

(i)  If  0  £  F'{Zl)  and  0  £  F'(Zr),  then  X  is  rejected; 

(ii)  If  0  €  F'(Zl)  but  0  £  F'(ZT),  then  return  to  step  2°  with  X  =  Z*; 

(iii)  If  0  F'(Zl )  but  0  6  F'(Zr),  then  return  to  step  2°  with  X  =  Zr; 

(iv)  If  0  €  F'(Zl)  and  0  6  F'{ZT),  then  one  of  the  interval  vectors  is  to  be  placed  on  a 

push-down  (last  in,  first  out)  stack  in  storage,  while  the  other  replaces  X  for  continued 
processing  at  step  2°.  In  this  algorithm,  the  choice  is  made  by  the  following  heuristic: 
If  w(F{Z1))  <  w(F(Zr)),  then  one  takes  X  =  Zl  and  stacks  Zr  for  processing  later; 
otherwise,  one  takes  X  =  ZT  and  stacks  Zl. 

The  region  selected  for  X  is  considered  to  be  “more  promising”  than  the  one  stacked 
because  the  variation  of  a  function  in  the  neighborhood  of  a  critical  point  is  asymptotically 
less  than  it  is  elsewhere.  The  goal  is  to  find  critical  points  as  quickly  as  possible,  particularly 
if  only  one  is  desired. 

6°.  (Multiprocessors)  Zl  is  sent  to  another  processor  following  the  bisection  (6.8). 
Return  to  step  1°  with  the  current  processor  taking  X  =  Zr,  while  the  other  takes  X  =  Zl. 
If  no  processors  happen  to  be  free,  then  Zl  circulates  or  is  put  on  a  common  stack  to  await 
the  first  available  processor.  If  a  number  of  processors  are  free,  it  could  be  expeditious  to 
decompose  X  into  more  than  two  subregions,  and  then  send  one  to  each  processor.  The 
choices  here  will  depend  to  a  great  extent  on  the  multiprocessor  configuration  actually 
used. 
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2°.  Compute  $(x)  =  (F(z),  F'(z),  F"(z))  in  IHESSIAN  arithmetic.  Compute  an 
inclusion  h  (X)  of  the  Krawczyk  transformation  of  X  by  (5.1). 

3°.  If  K(X)  C  X,  then  the  interval  iteration 

(6.1)  X°  =  X,  Xn+1  -  K{Xn)  n  Xn 
is  performed  until  it  converges  to 

(6.2)  X*  =  X*  C  X*+1, 
in  a  finite  number  of  steps  [12].  The  values 

(6.3)  X*,  4>(X*)  =  (F(X*),F'(X*),F"(X*)), 

are  output.  The  existence  of  a  critical  point  z*  of  /  in  X  is  guaranteed,  and  furthermore 
the  bounds 

(6.4)  X*  €  X',  (/(**),  V/(x*),  e  (Fix’),  F'(X‘),F"(X')), 

for  x*  and  the  values  of  the  function  /,  its  gradient  vector  V/,  and  its  Hessian  matrix  Hf 
at  x*.  These  bounds  are  usually  as  good  as  can  be  obtained  by  floating-point  computation, 
and  F"(X*)  can  be  used  to  determine  the  nature  of  the  critical  point  z*,  if  necessary. 

4°.  If 

(6.5)  k(X)  n  X  =  0, 
then  X  is  rejected. 

5°.  In  the  indeterminate  case,  the  region 

(6.6)  Z  =  =  k(X)  n  X 

is  bisected  in  the  direction  of  its  widest  component.  An  index  j  is  determined  such  that 


(6.7) 


w(Zj)  >  w(Z,), 


t  =  1,2 . n, 


Furthermore, 


t 

k 


(5.5)  K(X)  C  X 


J 

/ 


(5.6)  K(X)  n  X  =  0 

thus  imply  (4.5)  or  (4.6),  respectively.  In  this  way,  existence  or  nonexistence  of  a  critical 
point  of  /  in  X  can  be  established  conclusively  by  a  computation  done  in  floating-point 
interval  arithmetic  by  a  computer,  providing  it  is  possible  to  obtain  an  inclusion  for  the 
solution  of  the  system  (5.1).  In  actual  practice,  the  widths  of  the  components  of  F"(x ) 
will  be  small,  and  good  inclusions  of  Y  can  be  obtained  by  a  process  of  floating-point 
approximation  followed  by  interval  iterative  refinement  [15],  which  will  fail  only  if  F"(x) 
contains  a  singular  or  very  badly  conditioned  matrix.  Interval  linear  system  solvers  of  this 
type  are  available  in  Pascal-SC  and  the  ACRITH  package. 


6.  The  Basic  Algorithm 

The  algorithm  described  in  this  section  will  find  one  or  all  the  critical  points  of  a  function 
/  in  a  given  initial  region  X ,  or  show  that  X  contains  no  critical  points,  provided  no 
exceptions  arise.  Exceptions  will  be  discussed  in  a  later  section.  The  computer  program 
implementing  this  algorithm  handles  exceptions  in  such  a  way  that  the  computation  always 
terminates  in  a  finite  number  of  steps.  Validated  upper  and  lower  bounds  are  given  for  all 
critical  points  and  values  found.  The  algorithm  will  be  presented  first  for  the  case  of  a  single 
processor,  in  the  way  it  has  actually  been  implemented.  Adaptation  to  a  multiprocessor 
environment  will  be  discussed  at  the  end  of  the  section. 

The  basic  steps  of  the  algorithm  are: 

1°.  Compute  4>(X)  =  (F(X),  F'(X),  F"(X))  in  IHESSIAN  arithmetic.  If  0  £  F’{X), 
then  X  is  rejected. 
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make  use  of  the  fact  that  an  optimization  problem  underlies  the  system  of  equations  being 
solved,  which  provides  information  additional  to  that  inherent  in  an  arbitrary  system  of 
nonlinear  equations.  Furthermore,  the  algorithms  given  here  differ  by  bisecting  intersected 
intervals  in  the  inconclusive  case,  which  results  in  a  certain  amount  of  increase  in  speed. 

The  use  of  subregions  has  the  advantage  that  the  tests  (4.1)  and  (4.5)-(4.6)  become 
more  sensitive  as  the  size  of  the  region  decreases,  that  is,  as  ||tu(X)||00  — >  0.  In  fact,  if 
x*  is  a  regular  critical  point  of  /,  that  is,  if  (f//(x*))-1  exists,  then  (4.5)  will  hold  for 
sufficiently  small  X  such  that  x*  £  X  if  V/  is  Lipschitz  continuous  [10].  The  disadvantages 
are  the  extra  bookkeeping  and  storage  required  for  pending  subregions.  However,  these 
are  not  overwhelming  on  modern  computers. 

5.  Implementation  of  the  Krawczyk  Transformation 

The  system  of  equations  (4.3),  as  stated,  has  the  real  coefficient  matrix  H  f(x),  interpreted 
as  a  degenerate  interval  matrix,  and  an  interval  right  side.  In  actual  computation,  instead 
of  solving  (4.3),  one  obtains  an  inclusion  5  of  the  solution  Y  of  the  system 

(5.1)  F"{x)Y  =  -F'(x)  +  {F"{x)  -  F"(X)}(Y  -  x), 

which  has  an  interval  coefficent  matrix  and  an  interval  right  side.  The  solution  of  such  a 
system 

(5.2)  AY  =  B 
is  defined  to  be 

(5.3)  Y  =  {y  \ay  =  b,  a  €  A,  b  e  B}, 

where  a  is  a  real  matrix,  and  6,  y  £  Rn,  provided  all  the  indicated  real  systems  are  solvable. 
In  this  case,  it  follows  that  5  C  K  (_  5,  where  E  is  the  solution  of  (4.3),  uiid  thus 


(5.4) 


K{X)  -  x  +  ECi  +  E  =  A'(.Y). 


where  I  denotes  here  the  n  x  n  identity  matrix,  and  x  =  m(X)  the  midpoint  vector  of  the 
interval  region  X.  The  real-valued  vectors  and  matrices  in  (4.2)  are  of  course  interpreted 
as  degenerate  interval-valued  objects. 

In  actual  practice,  K(X)  is  computed  by  solving  the  linear  system 

(4.3)  (tf /(*))£  =  - V/(i)  +  {///(*)  -  F"(X)  }(X  -  x) 
for  5,  from  which 

(4.4)  K{X)  =  x  +  5. 

Once  K{ X)  has  been  computed,  one  of  the  following  alternatives  holds.  If 

(4.5)  K{X)  C  X, 
then  there  exists  a  critical  point  x*  £  K(X)  of  /;  if 

(4.6)  K{X)  n  X  =  0, 

is  empty,  then  X  does  not  contain  a  critical  point  of  /  (another  rejection  criterion);  oth¬ 
erwise,  the  test  is  inconclusive  [7]. 

With  regard  to  (4.6),  the  intersection  of  interval  vectors  X,  Y  is  said  to  be  empty  if 
for  some  t,  X*  and  Yt  are  disjoint  intervals.  It  also  follows  from  (4.2)  that  if  x*  £  X  is  a 
critical  point  of  /,  then  x*  6  X(X).  Thus,  in  the  inconclusive  case,  the  region 

(4.7)  Z  =  X(1  K(X)  C  X 

will  also  contain  any  critical  points  of  /  which  lie  in  X.  This  suggests  decomposing  Z  (which 
may  be  equal  to  X)  into  several  subregions,  and  applying  the  above  tests  to  each.  The 
resulting  algorithms,  described  in  more  detail  below,  are  essentially  modifications  of  the 
one  given  by  Moore  and  Jones  [9]  for  locating  solutions  of  systems  of  nonlinear  equations 
in  several  variables.  These  algorithms  differ  from  the  Moore-Jones  method  in  that  they 


«.T- ; 


«•_  W ’  «f%  *r_  -  , 


-  ■.  \  *.  V  V  ■. 


b 


a 


and 


(3.18) 


||V/(x)  -  V/(y)||oo  <  1890.0  ■  ||x  -  y| 


for  x,y  E  X  =  ([0.9, 1.2],  (0. 8, 1.1])  [14].  The  Lipschitz  continuity  of  V /  will  enter  into  the 
discussion  later. 

In  actual  practice,  the  arithmetic  operators  and  standard  library  functions  for  type 
IHESSIAN  can  be  programmed  once  and  for  all,  and  stored  in  a  small  subroutine  library, 
as  has  been  done  in  Pascal-SC  [13]. 

4.  Tests  for  Existence  or  Nonexistence  of  Critical  Points 

The  key  issue  in  the  algorithm  described  in  this  paper  is  to  determine  if  a  region  in  Rn 
defined  by  an  interval  vector  X  E  IRn  contains  a  critical  point  of  the  function  /  :  Rn  — ►  R 
or  not.  First  of  all,  if 


(4.1) 


0  #F'(X), 


then  it  is  impossible  that  V /(x)  =  0  for  x  E  X,  and  X  can  be  rejected,  since  it  does  not 
contain  a  critical  point  of  /  [7].  On  the  other  hand,  0  E  F'(X)  does  not  necessarily  mean 
that  X  contains  a  critical  point  of  /,  because  F'(X)  overestimates  V  f(X)  in  general.  The 
intersection  of  all  interval  vectors  containing  f(X)  is  called  the  interval  hull  of  /(X).  In 
several  dimensions,  the  interval  hull  of  /(X)  can  contain  points  outside  of  /(X)  in  general 
[81- 

In  addition  to  the  rejection  criterion  (4.1),  a  test  which  is  capable  of  establishing  the 
existence  of  a  critical  point  x*  in  X  is  necessary.  For  this  purpose,  the  test  given  by  Moore 
[m3]  will  be  used.  This  test  is  based  on  the  application  of  the  Krawczyk  transformation  K 
[5]  to  X: 


(4.2) 


K(X)  =  x  -  (H/(x))'1V/(x)  4  {/  -  (///(x))~V"(X)}(X  -  x), 
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with  £  =  0.  The  total  number  of  Krawczyk  transformations  required  was  82,  48  intervals 
were  rejected,  all  5  critical  points  were  found,  and  there  were  no  exceptions.  This  function 
is  less  of  a  computational  challenge  than  the  Rosenbrock  function  /^(z);  however,  the 
critical  points  ±y*  and  ±z*  tend  to  shield  x*  from  straightforward  iterative  procedures, 
such  as  Newton’s  method.  Letting  T(-)  denote  the  number  of  Krawczyk  transformations 
required  to  locate  a  given  critical  point,  the  results  were: 

T(x")  =  6, 

T(y*)  =  21, 

(10.13)  T(z*)  =  45, 

T(-y*)  =  69, 

T(-z*)  =  79. 


The  search  for  the  global  minimum  of  g{x)  in  X0  required  60  Krawczyk  transfor¬ 
mations,  46  intervals  were  rejected,  2  critical  points  were  located  in  order  of  decreasing 
function  value,  and  there  were  no  exceptions.  The  critical  points  found  were  first  - z *  and 
then  the  global  minimum  point  x*,  with 

T{-z*)  =  10, 

(10.14) 

T{x*)  =  57. 

Obviously,  the  search  took  an  entirely  different  path  than  the  exhaustive  search  (10.13) 
for  all  critical  points  of  g(x)  in  Xq. 

The  search  for  the  global  critical  maximum  of  g(x)  was  somewhat  slower,  due  to  the 
fact  that  g(x)  attains  it  maximum  at  a  noncritical  point  on  the  boundary  dX0  of  X0- 
Consequently,  the  function  M(t)  which  gives  a  lower  bound  for  the  critical  maximum  was 
updated  only  at  critical  points.  This  computation  required  76  Krawczyk  transformations. 
50  intervals  were  rejected,  and  three  critical  points  (x*.  y*.  and  -y*)  were  found  in  order 


of  nondecreasing  function  values.  The  number  of  transformations  required  were: 


T{x*)  =  6, 

(10.15)  TV)  =  21, 

T(-y*)  =  64. 

The  critical  points  ±z*  were  also  located,  but  rejected,  because  the  values  of  g(x)  at 
these  relative  minimum  points  is  smaller  than  at  the  saddle  points  ±y*.  Investigation  of 
the  nature  of  critical  points  is  done  by  finding  bounds  for  the  eigenvalues  of  all  symmetric 
matrices  A  such  that  A  G  G"(X*),  using  the  Pascal-SC  procedure  EIGEN.  Denoting  the 
eigenvalues  of  a  2  x  2  symmetric  matrix  A  by  Ai(.A)  and  A2(v4),  then  intervals  Ai(X*)  and 
A2(X*)  are  '•omputed  by  EIGEN  such  that 

(10.16)  (A,(A)  |  A  G  G"(X*)}  C  A,(X*),  t  =  1,2. 


The  character  of  the  critical  point  x*  G  X*  can  be  decided  on  the  basis  of  these  bounds. 
The  results  of  the  inter  val  iteration  to  critical  points  were: 

X*  =  ([-2.0  x  10~",2.0  x  1(T99),  [-2.0  x  10“",2.0  x  10“"]), 

G(X*)  =  [-2.05  x  10-99,5.00  x  10-"1 

r'lY*\  -  /  [-1-52  X  10-98, 1.52.  10- 
[  \  [-6.00  x  10-",6.00  x  10" 


(10.17) 


r98]  x 
)-"] )  ’ 


7"(X*)  =  ( 


3.99999999999,4.00000000000]  - 


')■ 


(10.18) 


-1  2 

The  eigenvalues  of  Hg( x*)  are  contained  in  the  intervals 
Aj(X*)  =  [4.142135623,4.142135625}, 

A2(X*)  =  [1.58578643759,1.58578643766], 

which  proves  conclusively  that  the  critical  point  x*  G  X*  is  a  minimum  point,  because 
both  eigenvalues  of  Hg(x*)  G  G"(X*)  must  be  positive  [1].  The  midpoint  of  X*  is  x  = 
x *  =  (0,0),  with  G(i)  =  g(x*)  =  0,  G'(x)  =  Vg{x*)  -  ^q)'  and  G"^  =  #£(**)  = 

(-*,  v)- 
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Y*  =  ([1.07054229181, 1.07054229185],  [0.535271145904,0.535271145921]), 
G(Y*)  =  [0.877361557501,0.877361558041], 


(10.19) 


G 

G 


>(V*\  _  /  [-5.81  x  10-10,5.26  x  10' 

(  }  ~  V  I-5.« 


(10.20) 


r>°]\ 

00  x  10_11,4.00  x  10~n]  )  ’ 

.  _  /  [-3.87308929305, -3.87308929080]  -l\ 

[Y)-{  -1  2  )' 

The  eigenvalues  of  Hg(y*)  are  contained  in  the  intervals 

Aj(y*)  =  [-4.03868818,-4.03868818], 
A2(K*)  =  [2.165598876,2.165598884], 
so  that  the  critical  point  y*  6  Y *  is  indubitably  a  saddle  point. 

The  next  values  obtained  were 

Z *  =  ([1.74755234581,1.74755234586]), 

G[Z *)  =  [0.298638440884,0.298638443572], 

(10.21)  Q, 

/  [1.21530892921, 1.21530892930]  -l\ 
-1  -2)' 

The  eigenvalues  of  Hg[z*)  are  contained  in  the  intervals 

Aj  =  [5.33440787,5.33440793], 

A2  =  [2.681868136,2.681868143], 


_  / [—2.18  x  10-lo,3.82  x  10-,o]\ 

'  ’  \  [-1.00  x  10“n, 0.00]  /’ 


(10.22) 


and  thus  the  critical  point  z*  is  a  relative  minimum  point. 

The  computed  intervals 

(10.23) 

-Y*  =  ([-1.070544229185, -1.07054229181],  [-0.535271145923, -0.535271145906]) 

and 

(10.24) 

-Z*  =  ([-1.74755234585, -1.74755234580],  [-0.873776172925, -0.873776172902]) 


contain  the  critical  points  —  y*  and  —2*,  respectively.  The  function,  gradient,  and  Hessian 
values  on  these  intervals  do  not  differ  significantly  from  the  corresponding  ones  for  Y* 
and  Z*.  In  particular,  the  eigenvalues  of  lie  in  the  intervals  (10.20),  while  the 

eigenvalues  of  Hg(-z *)  belong  to  the  intervals  (10.22).  Thus,  —  y*  is  guaranteed  to  be  a 
saddle  point,  and  -2*  a  relative  minimum  point  of  g. 
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